Bioconductor AnnotationData Packages: http://www.bioconductor.org/packages/release/data/annotation/
AnnotationHub: https://bioconductor.org/packages/AnnotationHub/
License: GPL-3.0
Many organism-level (org) packages are readily available on Bioconductor. They provide mappings between a central identifier (e.g. Entrez Gene identifiers) and other identifiers (e.g. Ensembl IDs, RefSeq identifiers, GO identifiers, etc.).
The name of an org package generally takes the form org.<Sp>.<id>.db (e.g. org.Hs.eg.db), where <Sp> is a short abbreviation of the organism, usually 2 letters (e.g. Hs for Homo sapiens), and <id> is a lower-case abbreviation describing the type of central identifier (e.g. eg for gene identifiers assigned by Entrez Gene, or sgd for the Saccharomyces Genome Database). A few packages deviate from this pattern (e.g. org.Mxanthus.db). Most Bioconductor annotation packages are updated every 6 months.
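The naming convention can be illustrated with a small base R sketch. The helper `parse_org_name()` is hypothetical (it is not part of Bioconductor) and assumes the common org.<Sp>.<id>.db pattern; names that do not fit return NA for both fields.

```r
# Hypothetical helper: split an org package name into organism and ID type.
# Assumes the common org.<Sp>.<id>.db pattern; other names yield NA fields.
parse_org_name <- function(pkg) {
  pattern <- "^org\\.([A-Za-z0-9]+)\\.([a-z]+)\\.db$"
  matched <- grepl(pattern, pkg)
  data.frame(
    package  = pkg,
    organism = ifelse(matched, sub(pattern, "\\1", pkg), NA_character_),
    id_type  = ifelse(matched, sub(pattern, "\\2", pkg), NA_character_),
    stringsAsFactors = FALSE
  )
}

parse_org_name(c("org.Hs.eg.db", "org.Sc.sgd.db", "org.At.tair.db", "org.Mxanthus.db"))
```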
R
cd /ngs/GO-Enrichment-Analysis-Demo
R
BiocManager
List the organism-level packages available for installation with BiocManager.
## [1] "org.Ag.eg.db" "org.At.tair.db" "org.Bt.eg.db" "org.Ce.eg.db"
## [5] "org.Cf.eg.db" "org.Dm.eg.db" "org.Dr.eg.db" "org.EcK12.eg.db"
## [9] "org.EcSakai.eg.db" "org.Gg.eg.db" "org.Hs.eg.db" "org.Mm.eg.db"
## [13] "org.Mmu.eg.db" "org.Mxanthus.db" "org.Pf.plasmo.db" "org.Pt.eg.db"
## [17] "org.Rn.eg.db" "org.Sc.sgd.db" "org.Ss.eg.db" "org.Xl.eg.db"
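A listing like the one above can be obtained with `BiocManager::available()`, which accepts a regular expression. This is a sketch, not necessarily the original chunk; it assumes BiocManager is installed and a Bioconductor repository is reachable, and `org_pkgs` is an illustrative name.

```r
# Sketch: list installable packages whose names start with "org.".
# Assumes BiocManager is installed and the repositories are reachable.
org_pkgs <- BiocManager::available("^org\\.")
org_pkgs
```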
org package
As an example, let’s download and install the Arabidopsis thaliana (thale cress) package.
## Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.2 (2020-06-22)
## Installing package(s) 'org.At.tair.db'
## Updating HTML index of packages in '.Library'
## Making 'packages.html' ... done
## Old packages: 'cpp11', 'ps'
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
##
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
## clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,
## parSapply, parSapplyLB
##
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
##
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname,
## do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
## is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int,
## pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff,
## sort, table, tapply, union, unique, unsplit, which, which.max, which.min
##
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with 'browseVignettes()'. To
## cite Bioconductor, see 'citation("Biobase")', and for packages
## 'citation("pkgname")'.
##
## Loading required package: IRanges
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
##
## The following object is masked from 'package:base':
##
## expand.grid
## OrgDb object:
## | DBSCHEMAVERSION: 2.1
## | Db type: OrgDb
## | Supporting package: AnnotationDbi
## | DBSCHEMA: ARABIDOPSIS_DB
## | ORGANISM: Arabidopsis thaliana
## | SPECIES: Arabidopsis
## | TAIRSOURCENAME: Tair
## | TAIRSOURCEDATE: 2020-Apr01
## | TAIRSOURCEURL: https://www.arabidopsis.org/
## | TAIRGOURL: https://www.arabidopsis.org/download_files/GO_and_PO_Annotations/Gene_Ontology_Annotations/ATH_GO_GOSLIM.txt
## | TAIRGENEURL: https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_functional_descriptions
## | TAIRSYMBOLURL: https://www.arabidopsis.org/download_files/Public_Data_Releases/TAIR_Data_20190331/gene_aliases_20190402.txt.gz
## | TAIRPATHURL: ftp://ftp.plantcyc.org/Pathways/Data_dumps/PMN14_January2020/pathways/Ara_pathways.20200125
## | TAIRPMIDURL: https://www.arabidopsis.org/download_files/Public_Data_Releases/TAIR_Data_20190331/Locus_Published_20190402.txt.gz
## | TAIRCHRURL: https://www.arabidopsis.org/download_files/Maps/seqviewer_data/sv_gene.data
## | TAIRATHURL: https://www.arabidopsis.org/download_files/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt
## | TAIRAGURL: https://www.arabidopsis.org/download_files/Microarrays/Affymetrix/affy_AG_array_elements-2010-12-20.txt
## | CENTRALID: TAIR
## | TAXID: 3702
## | KEGGSOURCENAME: KEGG GENOME
## | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
## | KEGGSOURCEDATE: 2011-Mar15
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: http://current.geneontology.org/ontology/go-basic.obo
## | GOSOURCEDATE: 2020-05-02
## | GOEGSOURCEDATE: 2019-Jul10
## | GOEGSOURCENAME: Entrez Gene
## | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | EGSOURCEDATE: 2019-Jul10
## | EGSOURCENAME: Entrez Gene
## | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
##
## Please see: help('select') for usage information
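The installation and loading messages above are consistent with a sequence like the following. It is a sketch rather than the original chunk, and it assumes BiocManager can reach a Bioconductor repository; the guards only skip work that has already been done.

```r
# Sketch: install (if needed) and load the Arabidopsis thaliana org package.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!requireNamespace("org.At.tair.db", quietly = TRUE))
    BiocManager::install("org.At.tair.db")

library(org.At.tair.db)

# Printing the object shows the OrgDb metadata (schema, source names, dates, URLs).
org.At.tair.db
```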
AnnotationHub
The method above returns only a limited number of organism-level annotation packages. Many more are available from Bioconductor’s AnnotationHub service.
To search for, download and install packages from the AnnotationHub service, first install AnnotationHub if it is not yet installed on your machine.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("AnnotationHub")
## Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.2 (2020-06-22)
## Installing package(s) 'AnnotationHub'
## Updating HTML index of packages in '.Library'
## Making 'packages.html' ... done
## Old packages: 'cpp11', 'ps'
AnnotationHub object
## Loading required package: BiocFileCache
## Loading required package: dbplyr
##
## Attaching package: 'AnnotationHub'
## The following object is masked from 'package:Biobase':
##
## cache
## using temporary cache /tmp/RtmpIaAhO4/BiocFileCache
## snapshotDate(): 2020-04-27
## [1] "https://annotationhub.bioconductor.org"
## AnnotationHub with 50277 records
## # snapshotDate(): 2020-04-27
## # $dataprovider: BroadInstitute, Ensembl, UCSC, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/,...
## # $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Bos taurus, Pan trogl...
## # $rdataclass: GRanges, BigWigFile, TwoBitFile, Rle, EnsDb, OrgDb, ChainFile, TxDb, In...
## # additional mcols(): taxonomyid, genome, description, coordinate_1_based,
## # maintainer, rdatadateadded, preparerclass, tags, rdatapath, sourceurl,
## # sourcetype
## # retrieve records with, e.g., 'object[["AH5012"]]'
##
## title
## AH5012 | Chromosome Band
## AH5013 | STS Markers
## AH5014 | FISH Clones
## AH5015 | Recomb Rate
## AH5016 | ENCODE Pilot
## ... ...
## AH83110 | Zonotrichia_albicollis.Zonotrichia_albicollis-1.0.1.ncrna.2bit
## AH83111 | Zosterops_lateralis_melanops.ASM128173v1.cdna.all.2bit
## AH83112 | Zosterops_lateralis_melanops.ASM128173v1.dna_rm.toplevel.2bit
## AH83113 | Zosterops_lateralis_melanops.ASM128173v1.dna_sm.toplevel.2bit
## AH83114 | Zosterops_lateralis_melanops.ASM128173v1.ncrna.2bit
## [1] 50277
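The hub summary above can be reproduced along these lines. This is a sketch that assumes network access on first use; the record count and snapshot date depend on the Bioconductor version in use, so they will likely differ from the output shown.

```r
library(AnnotationHub)

# Creating the hub object fetches (or refreshes) the hub metadata.
ah <- AnnotationHub()

hubUrl(ah)   # address of the hub server
ah           # summary of data providers, species and data classes
length(ah)   # number of records in this snapshot
```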
org records
Search for organism-level packages with the pattern-matching string “^org\\.”.
## [1] "DFrame"
## attr(,"package")
## [1] "S4Vectors"
Show the query results stored in the DFrame.
## [,1]
## [1,] "title"
## [2,] "dataprovider"
## [3,] "species"
## [4,] "taxonomyid"
## [5,] "genome"
## [6,] "description"
## [7,] "coordinate_1_based"
## [8,] "maintainer"
## [9,] "rdatadateadded"
## [10,] "preparerclass"
## [11,] "tags"
## [12,] "rdataclass"
## [13,] "rdatapath"
## [14,] "sourceurl"
## [15,] "sourcetype"
## [1] 1480
## DataFrame with 1480 rows and 2 columns
## title species
## <character> <character>
## AH79568 org.Ag.eg.db.sqlite Anopheles gambiae
## AH79569 org.At.tair.db.sqlite Arabidopsis thaliana
## AH79570 org.Bt.eg.db.sqlite Bos taurus
## AH79571 org.Cf.eg.db.sqlite Canis familiaris
## AH79572 org.Gg.eg.db.sqlite Gallus gallus
## ... ... ...
## AH81959 org.Bathycoccus_prasinos.eg.sqlite Bathycoccus prasinos
## AH81960 org.Kwoniella_pini_CBS_10737.eg.sqlite Kwoniella pini_CBS_10737
## AH81961 org.Burkholderia_cepacia_ATCC_25416.eg.sqlite Burkholderia cepacia_ATCC_25416
## AH81962 org.Burkholderia_cepacia_DSM_7288.eg.sqlite Burkholderia cepacia_DSM_7288
## AH81963 org.Burkholderia_cepacia_LMG_1222.eg.sqlite Burkholderia cepacia_LMG_1222
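The search and the inspection steps above can be sketched as follows. The name `df` is taken from the code later in this document; `org_hits` is illustrative. `mcols()` is called through its S4Vectors namespace so that the sketch does not depend on which packages happen to be attached.

```r
library(AnnotationHub)
ah <- AnnotationHub()

# Keep only records whose metadata matches the regular expression ^org\.
org_hits <- query(ah, "^org\\.")   # org_hits is an illustrative name

# mcols() returns the record metadata as an S4Vectors DataFrame (class DFrame).
df <- S4Vectors::mcols(org_hits)
class(df)
cbind(colnames(df))          # metadata column names, one per row
nrow(df)                     # number of matching records
df[, c("title", "species")]  # compact view of the hits
```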
org package
Let’s search for and install the Felis catus (cat) package.
# Search df with keyword
data.table::as.data.table(df[, c("title", "species")], keep.rownames = TRUE)[grep("Felis", species)]
## rn title species
## 1: AH80647 org.Felis_catus.eg.sqlite Felis catus
## 2: AH80648 org.Felis_domesticus.eg.sqlite Felis domesticus
## 3: AH80649 org.Felis_silvestris_catus.eg.sqlite Felis silvestris_catus
## 4: AH80906 org.Felis_canadensis.eg.sqlite Felis canadensis
## 5: AH81162 org.Felis_concolor.eg.sqlite Felis concolor
## downloading 1 resources
## retrieving 1 resource
## loading from cache
## OrgDb object:
## | DBSCHEMAVERSION: 2.1
## | DBSCHEMA: NOSCHEMA_DB
## | ORGANISM: Felis catus
## | SPECIES: Felis catus
## | CENTRALID: GID
## | Taxonomy ID: 9685
## | Db type: OrgDb
## | Supporting package: AnnotationDbi
##
## Please see: help('select') for usage information
## record status dateadded
## 1 AH80647 Public 2020-04-27
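A record is retrieved by its hub ID with `[[`, and `recordStatus()` reports whether it is still public. The sketch below uses the ID AH80647 from the search output above; that ID belongs to the snapshot shown there and may not be available in other snapshots. The object name `org.Fc.eg.db` follows the code used later in this document.

```r
library(AnnotationHub)
ah <- AnnotationHub()

# Retrieve the Felis catus OrgDb record by hub ID; the first call downloads it.
org.Fc.eg.db <- ah[["AH80647"]]
org.Fc.eg.db

# Report the status of the record and the date it was added.
recordStatus(ah, "AH80647")
```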
After an annotation package is retrieved, it is placed in the local AnnotationHub cache. You can use it again without having to download it a second time.
## [1] "/home/ihsuan/.cache/AnnotationHub"
## loading from cache
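The cache location and the cached reload shown above correspond to calls like these (a sketch; the cache path is specific to each machine and user):

```r
library(AnnotationHub)
ah <- AnnotationHub()

hubCache(ah)                      # directory that holds downloaded resources
org.Fc.eg.db <- ah[["AH80647"]]   # read from the cache after the first download
```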
org db objects
columns
Shows which kinds of data can be returned for the AnnotationDb object.
Both objects contain Gene Ontology mapping information.
## [1] "ARACYC" "ARACYCENZYME" "ENTREZID" "ENZYME" "EVIDENCE"
## [6] "EVIDENCEALL" "GENENAME" "GO" "GOALL" "ONTOLOGY"
## [11] "ONTOLOGYALL" "PATH" "PMID" "REFSEQ" "SYMBOL"
## [16] "TAIR"
## [1] "ACCNUM" "ALIAS" "CHR" "ENSEMBL" "ENTREZID" "EVIDENCE"
## [7] "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL" "ONTOLOGY"
## [13] "ONTOLOGYALL" "PMID" "REFSEQ" "SYMBOL"
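A minimal sketch for the Arabidopsis object, assuming org.At.tair.db is installed; `cols` is an illustrative name. The membership check mirrors the statement that Gene Ontology mappings are present.

```r
library(org.At.tair.db)

cols <- columns(org.At.tair.db)   # cols is an illustrative name
cols

# Confirm that the Gene Ontology columns are present.
c("GO", "GOALL", "ONTOLOGY") %in% cols
```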
keytypes
Shows which columns can be used as keys.
## [1] "ARACYC" "ARACYCENZYME" "ENTREZID" "ENZYME" "EVIDENCE"
## [6] "EVIDENCEALL" "GENENAME" "GO" "GOALL" "ONTOLOGY"
## [11] "ONTOLOGYALL" "PATH" "PMID" "REFSEQ" "SYMBOL"
## [16] "TAIR"
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENTREZID" "EVIDENCE" "EVIDENCEALL"
## [7] "GENENAME" "GID" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL"
## [13] "PMID" "REFSEQ" "SYMBOL"
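Columns and keytypes need not coincide. In the output above the two sets match for the Arabidopsis object, whereas CHR is listed as a column but not as a keytype for the cat object. A sketch of how to compare them, shown here for org.At.tair.db:

```r
library(org.At.tair.db)

keytypes(org.At.tair.db)

# Columns that cannot be used as keys (an empty result means the sets match).
setdiff(columns(org.At.tair.db), keytypes(org.At.tair.db))
```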
keys
Returns the values (keys) that can be expected for a given keytype. By default, it returns the primary keys of the database.
## [1] "AT1G01010" "AT1G01020" "AT1G01030" "AT1G01040" "AT1G01050" "AT1G01060" "AT1G01070"
## [8] "AT1G01073" "AT1G01080" "AT1G01090"
## [1] "ANAC001" "NAC001" "NTL10" "ARV1" "NGA3" "ASU1" "ATDCL1" "CAF"
## [9] "DCL1" "EMB60"
## [1] "GO:0003700" "GO:0005634" "GO:0006355" "GO:0003674" "GO:0005739" "GO:0005783"
## [7] "GO:0005794" "GO:0006665" "GO:0009507" "GO:0016125"
## [1] "414734" "445455" "448843" "492297" "492308" "493648" "493649" "493650" "493651"
## [10] "493652"
## [1] "A1BG" "A1CF" "A2M" "A2ML1" "A3GALT2" "A4GALT" "A4GNT" "AAAS"
## [9] "AACS" "AADAC"
## [1] "GO:0000002" "GO:0000003" "GO:0000012" "GO:0000014" "GO:0000015" "GO:0000027"
## [7] "GO:0000028" "GO:0000030" "GO:0000033" "GO:0000035"
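The first three listings above match calls of the following shape on the Arabidopsis object; the same calls can be applied to the cat object. This is a sketch, and the variable names are illustrative.

```r
library(org.At.tair.db)

# Without a keytype, keys() returns the primary keys (TAIR locus IDs here).
tair_ids <- head(keys(org.At.tair.db), 10)
tair_ids

# Keys of other types are requested through the keytype argument.
head(keys(org.At.tair.db, keytype = "SYMBOL"), 10)
head(keys(org.At.tair.db, keytype = "GO"), 10)
```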
select
Retrieves data as a data.frame for the given keys, columns and keytype arguments.
## [1] "AT1G01010" "AT1G01020" "AT1G01030" "AT1G01040" "AT1G01050" "AT1G01060" "AT1G01070"
## [8] "AT1G01073" "AT1G01080" "AT1G01090"
## 'select()' returned 1:many mapping between keys and columns
## TAIR SYMBOL
## 1 AT1G01010 ANAC001
## 2 AT1G01010 NAC001
## 3 AT1G01010 NTL10
## 4 AT1G01020 ARV1
## 5 AT1G01030 NGA3
## 6 AT1G01040 ASU1
## 7 AT1G01040 ATDCL1
## 8 AT1G01040 CAF
## 9 AT1G01040 DCL1
## 10 AT1G01040 EMB60
## 11 AT1G01040 EMB76
## 12 AT1G01040 SIN1
## 13 AT1G01040 SUS1
## 14 AT1G01050 AtPPa1
## 15 AT1G01050 PPa1
## 16 AT1G01060 LHY
## 17 AT1G01060 LHY1
## 18 AT1G01070 UMAMIT28
## 19 AT1G01073 <NA>
## 20 AT1G01080 <NA>
## 21 AT1G01090 PDH-E1
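The table above has the shape produced by a `select()` call such as the one sketched here. Because one TAIR ID can carry several symbols, the result has more rows than keys. The `mapIds()` call is an addition that does not appear in the original; it is shown only as one way to get a single value per key.

```r
library(org.At.tair.db)

# First ten primary keys, then their gene symbols (1:many mapping).
tairKeys <- head(keys(org.At.tair.db), 10)   # tairKeys is an illustrative name
res <- select(org.At.tair.db, keys = tairKeys, columns = "SYMBOL", keytype = "TAIR")
res

# Addition for illustration: keep only the first symbol found for each key.
mapIds(org.At.tair.db, keys = tairKeys, column = "SYMBOL",
       keytype = "TAIR", multiVals = "first")
```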
myKeys <- c("CCA1", "LHY", "PRR7", "PRR9") # morning loop components
select(org.At.tair.db, keys = myKeys, columns = "ENTREZID", keytype = "SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
## SYMBOL ENTREZID
## 1 CCA1 819296
## 2 LHY 839341
## 3 PRR7 831793
## 4 PRR9 819292
## [1] "ENSFCAG00000000001" "ENSFCAG00000000007" "ENSFCAG00000000015" "ENSFCAG00000000022"
## [5] "ENSFCAG00000000023" "ENSFCAG00000000024" "ENSFCAG00000000028" "ENSFCAG00000000029"
## [9] "ENSFCAG00000000030" "ENSFCAG00000000031"
## 'select()' returned 1:1 mapping between keys and columns
## ENSEMBL SYMBOL
## 1 ENSFCAG00000000001 INTS6L
## 2 ENSFCAG00000000007 HMGCR
## 3 ENSFCAG00000000015 CEP192
## 4 ENSFCAG00000000022 RASGRP1
## 5 ENSFCAG00000000023 GPR39
## 6 ENSFCAG00000000024 LYPD1
## 7 ENSFCAG00000000028 RCN3
## 8 ENSFCAG00000000029 APOO
## 9 ENSFCAG00000000030 CXHXorf58
## 10 ENSFCAG00000000031 CB1H4orf19
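For the cat object, the Ensembl-to-symbol table above is consistent with the sketch below. It assumes the hub record AH80647 from the earlier search is still available; AnnotationDbi is attached explicitly because loading an OrgDb from the hub may not attach it. `ensKeys` is an illustrative name.

```r
library(AnnotationHub)
library(AnnotationDbi)

ah <- AnnotationHub()
org.Fc.eg.db <- ah[["AH80647"]]

# First ten Ensembl gene IDs, then their gene symbols.
ensKeys <- head(keys(org.Fc.eg.db, keytype = "ENSEMBL"), 10)
ensKeys
fc_res <- select(org.Fc.eg.db, keys = ensKeys, columns = "SYMBOL", keytype = "ENSEMBL")
fc_res
```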
myKeys <- c("ASIP", "MC1R") # coat color patterns
select(org.Fc.eg.db, keys = myKeys, columns = c("ENSEMBL", "ENTREZID"), keytype = "SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
## SYMBOL ENSEMBL ENTREZID
## 1 ASIP ENSFCAG00000011037 492297
## 2 MC1R ENSFCAG00000003798 493917
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/r4/lib/libopenblasp-r0.3.10.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
## [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets methods
## [9] base
##
## other attached packages:
## [1] AnnotationHub_2.20.1 BiocFileCache_1.12.1 dbplyr_1.4.4
## [4] org.At.tair.db_3.11.4 AnnotationDbi_1.50.3 IRanges_2.22.2
## [7] S4Vectors_0.26.1 Biobase_2.48.0 BiocGenerics_0.34.0
## [10] knitr_1.29
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5 later_1.1.0.1
## [3] compiler_4.0.2 pillar_1.4.6
## [5] BiocManager_1.30.10 tools_4.0.2
## [7] digest_0.6.25 bit_4.0.4
## [9] RSQLite_2.2.0 evaluate_0.14
## [11] memoise_1.1.0 tibble_3.0.3
## [13] lifecycle_0.2.0 pkgconfig_2.0.3
## [15] rlang_0.4.7 shiny_1.5.0
## [17] DBI_1.1.0 curl_4.3
## [19] yaml_2.2.1 xfun_0.16
## [21] fastmap_1.0.1 httr_1.4.2
## [23] stringr_1.4.0 dplyr_1.0.1
## [25] rappdirs_0.3.1 generics_0.0.2
## [27] vctrs_0.3.2 tidyselect_1.1.0
## [29] bit64_4.0.2 data.table_1.13.0
## [31] glue_1.4.1 R6_2.4.1
## [33] rmarkdown_2.3 purrr_0.3.4
## [35] blob_1.2.1 magrittr_1.5
## [37] promises_1.1.1 htmltools_0.5.0
## [39] ellipsis_0.3.1 assertthat_0.2.1
## [41] xtable_1.8-4 mime_0.9
## [43] interactiveDisplayBase_1.26.3 httpuv_1.5.4
## [45] stringi_1.4.6 BiocVersion_3.11.1
## [47] crayon_1.3.4